VoiceLink tutorial
This project uses the ESP32-S3 as an AI processor for voice-command recognition. This tutorial
walks you through implementing the system step by step.
Part 1: Data Collection
In this section, we will record raw audio data from the ESP32 to build our dataset for training the AI model.
Step 1: Upload Firmware
We need to flash the ESP32 with code that reads audio from the microphone and sends it over Serial.
- Open your project in the PlatformIO environment.
- Locate the file named dataCollection.cpp in your file explorer.
- Copy all the content from dataCollection.cpp.
- Open main.cpp (located in the src folder).
- Replace the entire content of main.cpp with the code you just copied.
- Uncomment the code if necessary.
- Connect your ESP32-S3 board to your computer via USB.
- Click the Upload button (Right arrow icon) in PlatformIO.
Step 2: Configure Python Script
Prepare the Python script that captures audio from the USB port.
- Check Device Manager to find your ESP32's COM Port number (e.g., COM3, COM9).
- Open collectorSerial.py.
- Find the line defining COMPORT and update it.
# ------------------------------------------------
# REPLACE WITH YOUR ESP32's port
COMPORT = 'COM9'  # Update to your ESP32's port
# ------------------------------------------------
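For context, the bytes arriving over the serial port are raw little-endian 16-bit mono PCM samples, and the collector script's job is essentially to write them into a WAV file. Here is a minimal, stdlib-only sketch of that conversion (the function name and the 16 kHz rate are illustrative assumptions, not taken from collectorSerial.py):

```python
import struct
import wave

def pcm_bytes_to_wav(raw: bytes, path: str, sample_rate: int = 16000) -> int:
    """Write little-endian 16-bit mono PCM bytes to a WAV file.

    Returns the number of samples written.
    """
    n_samples = len(raw) // 2          # 2 bytes per int16 sample
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono microphone
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(raw[: n_samples * 2])
    return n_samples

# Example: one second of silence at 16 kHz
silence = struct.pack("<16000h", *([0] * 16000))
print(pcm_bytes_to_wav(silence, "sample.wav", 16000))  # → 16000
```

The real script adds the serial-port reading (e.g., via pyserial) on top of this; the WAV-writing part stays the same.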
Step 3: Record Your First Class
We will now record the keywords.
- Run the script:
python collectorSerial.py
- Enter the label you want to record (e.g., on, off, or creative names like haha, hehe).
- When you see [READY] Press SPACE to record..., press SPACE and speak clearly into the mic.
Step 4: Repeat & Adjust
Repeat Step 3 for your second keyword and for background noise.
- To change recording duration, adjust this line in the Python script:
# ------------------------------------------------
RECORD_SECONDS = 60  # Change this for a longer dataset
# ------------------------------------------------
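To sanity-check a recording, you can predict how much data a given RECORD_SECONDS should produce: 16-bit mono audio occupies 2 bytes per sample, so size = seconds × sample rate × 2. A quick helper (the 16 kHz default is an assumption; use whatever rate your firmware actually reports):

```python
def expected_recording_size(record_seconds: int, sample_rate: int = 16000) -> int:
    """Bytes a 16-bit mono PCM recording of this length should occupy."""
    bytes_per_sample = 2                # int16 PCM
    return record_seconds * sample_rate * bytes_per_sample

# 60 s at 16 kHz → 1,920,000 bytes (~1.9 MB)
print(expected_recording_size(60, 16000))  # → 1920000
```

If the captured file is much smaller than this, the serial link is probably dropping data.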
Part 2: Uploading & Splitting
Now we upload the raw audio files to Edge Impulse and chop them into 1-second samples.
Step 1: Upload Data
- Go to the Data acquisition tab in Edge Impulse.
- Click Collect new data -> Add data -> Upload data.
- Select your recorded file.
- Settings:
- Method: "Automatically split between training and testing".
- Label: Enter the label matching your file (e.g., 'on').
- Click Upload data.
Step 2: Split Samples
Since we recorded one long file, we need to split it into individual keywords.
- Find your uploaded file in the list.
- Click the three dots (⋮) -> Split sample.
- The app will auto-detect segments. Validate that the boxes cover the audio correctly.
- Click Split.
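Under the hood, splitting boils down to slicing the long recording into fixed-length clips (Edge Impulse also auto-detects where the keywords are; that detection is omitted here). A sketch of the basic slicing, with illustrative names:

```python
def split_into_clips(samples: list, sample_rate: int, clip_seconds: float = 1.0) -> list:
    """Cut a long recording into non-overlapping clips of clip_seconds.

    A trailing remainder shorter than one clip is dropped, mirroring
    how fixed-length keyword samples are produced.
    """
    clip_len = int(sample_rate * clip_seconds)
    return [
        samples[i : i + clip_len]
        for i in range(0, len(samples) - clip_len + 1, clip_len)
    ]

# A 3.5-second recording at 8 kHz yields three 1-second clips
recording = [0] * int(3.5 * 8000)
clips = split_into_clips(recording, 8000)
print(len(clips), len(clips[0]))  # → 3 8000
```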
Step 3: Repeat & Finalize
- Repeat splitting for your other keyword file.
- For the Noise/Background file: Do NOT split the sample. Keep it continuous.
- Go to the Dashboard tab, scroll down, and click Perform train / test split.
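The train / test split simply repartitions your samples so some are held back for testing, commonly around an 80/20 ratio (the ratio and shuffle below are illustrative, not Edge Impulse's exact procedure):

```python
import random

def train_test_split(items: list, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle items and split them into (train, test) lists."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split([f"clip_{i}" for i in range(10)])
print(len(train), len(test))  # → 8 2
```

The point of holding samples back is that test accuracy is only meaningful on clips the model never saw during training.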
Part 3: Training the Model
We will design the processing pipeline and train the Neural Network.
Step 1: Create Impulse
Navigate to the Create Impulse tab.
- Time Series Data:
- Window size: 1000 ms.
- Window increase: 500 ms.
- Zero-pad data: Unchecked.
⚠️ CRITICAL: Frequency Setting
This must match your hardware's actual sample rate exactly. If your data says 6392 Hz, enter 6392 here.
Do not use 16000 Hz unless the data was actually recorded at 16000 Hz.
- Processing Block: Add "Audio (MFCC)".
- Learning Block: Add "Classification (Keras)".
- Click Save Impulse.
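With a 1000 ms window and a 500 ms window increase, each clip is scanned by a sliding window, and the number of windows a clip yields is floor((clip_ms - window_ms) / increase_ms) + 1. A quick check of that arithmetic:

```python
def num_windows(clip_ms: int, window_ms: int = 1000, increase_ms: int = 500) -> int:
    """Count sliding windows of window_ms stepping by increase_ms over a clip."""
    if clip_ms < window_ms:
        return 0
    return (clip_ms - window_ms) // increase_ms + 1

# A 1-second keyword clip gives one window; 3 s of noise gives five
print(num_windows(1000), num_windows(3000))  # → 1 5
```

This is why keeping the noise recording continuous pays off: a longer clip yields many overlapping training windows.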
Step 2: Generate Features (MFCC)
- Go to the MFCC tab.
- Click Save parameters (Defaults are usually fine).
- Click Generate features.
- Visual Check: Look at the "Feature Explorer" graph.
- Good: Distinct clusters of colors.
- Bad: Dots mixed together like a smoothie.
Step 3: Classifier (Training)
Go to the Classifier tab to train the brain.
- Settings:
- Training cycles: Set to 60.
- Learning rate: 0.005.
- Data augmentation: CHECKED.
- Click Save & train.
Step 4: The Result
Wait for training to finish and examine the Confusion Matrix.
- Goal: >85% accuracy for Keywords, >90% for Noise.
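Per-class accuracy is read straight off the confusion matrix: the diagonal entry (correct predictions) divided by that class's row total. A small example with made-up numbers:

```python
def per_class_accuracy(matrix: list) -> list:
    """Diagonal / row-sum for each class of a square confusion matrix."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

# Rows = true class, columns = predicted class: [on, off, noise]
confusion = [
    [18, 1, 1],    # "on":    18/20 correct
    [2, 17, 1],    # "off":   17/20 correct
    [0, 1, 19],    # "noise": 19/20 correct
]
print([round(a, 2) for a in per_class_accuracy(confusion)])  # → [0.9, 0.85, 0.95]
```

Against the goals above, this hypothetical model passes for "on" and "noise" but just misses the 85% bar for "off", suggesting more "off" samples are needed.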
Part 4: Deployment (The Transplant)
Goal: Move the "brain" onto the chip for offline use.
Step 1: Export the Library
- Go to the Deployment tab in Edge Impulse.
- Search for Arduino Library.
- Select TensorFlow Lite as Deployment engine.
- Select Quantized (int8). (Critical for ESP32 performance).
- Click Build to download the .zip file.
Step 2: Install in PlatformIO
- Unzip the downloaded file on your computer.
- Drag & Drop: Move the extracted folder (e.g., ESP_VOICE_inferencing) into your PlatformIO project's lib/ folder.
Structure Check: Verify your project folder looks exactly like this:
MyProject/
├── lib/
│   └── ESP_VOICE_inferencing/   <-- The folder you just added
├── src/
│   ├── main.cpp
│   └── modelRun.cpp
└── platformio.ini
Step 3: The Final Code
Now we replace the data collection script with the actual AI logic.
- Locate the file named modelRun.cpp in your file explorer.
- Copy all the content from modelRun.cpp.
- Open main.cpp (located in the src folder).
- Replace the entire content of main.cpp with the code you just copied.
- Uncomment the code if necessary.
- Important: Update the library include line at the top of the code to match your folder name:
// ----------------------------------------- //
// REPLACE WITH YOUR LIBRARY
#include "quocanmeomeo-project-1_inferencing.h"
// ----------------------------------------- //
Step 4: Upload & Test
- Connect your ESP32.
- Click Upload in PlatformIO.
- Open the Serial Monitor and start speaking your keywords!